Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction
Authors
Maciej Zięba, Sebastian K. Tomczak, Jakub M. Tomczak
Abstract
Bankruptcy prediction has been a subject of interest for almost a century, and it still ranks among the hottest topics in economics. The aim of predicting financial distress is to develop a predictive model that combines various econometric measures and allows one to foresee the financial condition of a firm. In this domain, various methods have been proposed that are based on statistical hypothesis testing, statistical modelling (e.g., generalized linear models), and, more recently, artificial intelligence (e.g., neural networks, Support Vector Machines, decision trees). In this paper, we propose a novel approach to bankruptcy prediction that utilizes Extreme Gradient Boosting for learning an ensemble of decision trees. Additionally, in order to reflect higher-order statistics in the data and impose prior knowledge about data representation, we introduce a new concept that we refer to as synthetic features. A synthetic feature is a combination of the econometric measures using arithmetic operations (addition, subtraction, multiplication, division). Each synthetic feature can be seen as a single regression model that is developed in an evolutionary manner. We evaluate our solution using data collected about Polish companies in five tasks corresponding to bankruptcy prediction in the 1st, 2nd, 3rd, 4th, and 5th year. We compare our approach with the reference methods.
Preprint submitted to Expert Systems with Applications, April 1, 2016.
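The notion of a synthetic feature described in the abstract can be illustrated with a minimal sketch. It only shows how two existing measures can be combined with one of the four arithmetic operations and appended to the data, so that later synthetic features may build on earlier ones. The abstract does not state how features and operations are selected, so the uniform random choice used here is an assumption, as are the function names and the feature names `X1`, `X2`, `X3`. The boosted-tree training step is omitted.

```python
import operator
import random

# The four arithmetic operations named in the abstract.
# Division by zero is mapped to NaN (an assumption of this sketch).
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else float("nan"),
}


def make_synthetic_feature(rows, names, rng):
    """Combine two distinct columns with one arithmetic operation.

    Columns and operation are drawn uniformly at random; this selection
    rule is a placeholder, not the procedure used in the paper.
    """
    i, j = rng.sample(range(len(names)), 2)
    symbol = rng.choice(sorted(OPERATIONS))
    op = OPERATIONS[symbol]
    new_name = f"({names[i]} {symbol} {names[j]})"
    new_column = [op(row[i], row[j]) for row in rows]
    return new_name, new_column


def extend_with_synthetic_features(rows, names, n_new, seed=0):
    """Return copies of rows and names extended with n_new synthetic features."""
    rng = random.Random(seed)
    rows = [list(row) for row in rows]  # copy so the input is not mutated
    names = list(names)
    for _ in range(n_new):
        new_name, new_column = make_synthetic_feature(rows, names, rng)
        names.append(new_name)
        for row, value in zip(rows, new_column):
            row.append(value)
    return rows, names


if __name__ == "__main__":
    data = [[1.0, 2.0, 4.0], [3.0, 0.0, 5.0]]
    extended, feature_names = extend_with_synthetic_features(
        data, ["X1", "X2", "X3"], n_new=2
    )
    print(feature_names)
    print(extended)
```

Because each new feature is appended before the next one is generated, a later synthetic feature can combine an earlier one with an original measure, which yields nested arithmetic expressions.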
Similar resources
Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring
Previous studies about ensembles of classifiers for bankruptcy prediction and credit scoring have been presented. In these studies, different ensemble schemes for complex classifiers were applied, and the best results were obtained using the Random Subspace method. The Bagging scheme was one of the ensemble methods used in the comparison. However, it was not correctly used. It is very important...
A Genetic Algorithm-Based Heterogeneous Random Subspace Ensemble Model for Bankruptcy Prediction
Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble techniques are known to be very useful in improving the generalization ability of a classifier. The random subspace ensemble technique is a simple but effective method of constructing ensemble classifiers, in which some features are randomly d...
Comparing ensembles of decision trees and neural networks for one-day-ahead streamflow prediction
Ensemble learning methods have received remarkable attention in the recent years and led to considerable advancement in the performance of the regression and classification problems. Bagging and boosting are among the most popular ensemble learning techniques proposed to reduce the prediction error of learning machines. In this study, bagging and gradient boosting algorithms are incorporated in...
Application of Genetic Algorithm in Development of Bankruptcy Predication Theory Case Study: Companies Listed on Tehran Stock Exchange
The bankruptcy prediction models have long been proposed as a key subject in finance. The present study, therefore, makes an effort to examine the corporate bankruptcy prediction through employment of the genetic algorithm model. Furthermore, it attempts to evaluate the strategies to overcome the drawbacks of ordinary methods for bankruptcy prediction through application of genetic algorithms. Thesa...
Bankruptcy Prediction by Supervised Machine Learning Techniques: A Comparative Study
It is very important for financial institutions to be capable of accurately predicting business failure. In the literature, a number of bankruptcy prediction models have been developed based on statistical and machine learning techniques. In particular, many machine learning techniques, such as neural networks, decision trees, etc. have shown better prediction performances than statistical ones....
Journal: Expert Syst. Appl.
Volume: 58
Publication date: 2016